PDX-481: fix(prompts): rewrite author-test flow to single-call construction#173
Conversation
…uction RCA: PR #153 (PDX-479 regression) shipped two artifacts — the provar.guide.orchestration prompt's author-test flow and the PROVAR_TOOL_GUIDE.md "I want to write a new test" section — that both steered LLMs toward generate-empty-then-step_edit-per-step authoring. PDX-480 confirmed locally that disabling these restores correct generation. This patch keeps the helpful guidance but rewrites it to recommend a single provar_testcase_generate call carrying the full step tree; step_edit is now explicitly marked amend-only. Fix: Rewrote the author-test flow in src/mcp/prompts/guidePrompts.ts and the matching section in docs/PROVAR_TOOL_GUIDE.md to mandate single-call construction. Split the orchestration prerequisite graph so provar_testcase_generate and provar_testcase_step_edit appear as distinct entry points with construct-vs-amend annotations. Added construct-vs-amend callouts to docs/mcp.md tool sections. Added pilot-guide Scenario 12 covering the multi-scenario single-call expectation. Added unit tests asserting the canonical phrasing (and absence of "repeat per step") plus a multi-scenario snapshot test that drives 12 steps through provar_testcase_generate in one call and asserts consecutive testItemIds, preserved scenario markers, consistent assert API IDs, and correct UiWithScreen nesting. Added scripts/pdx-481-validate.cjs for direct JSON-RPC verification of the fix at the protocol surface.
Quality Orchestrator🟢 LOW · 🧪 Tests to Run · Running 1 of 51 tests
▶ Run commandnpx vitest run \
unit/mcp/guidePrompts.test.ts⚡ quality-orchestrator · |
There was a problem hiding this comment.
Pull request overview
Rewrites the MCP author-test orchestration guidance and related documentation to mandate single-call test case construction via provar_testcase_generate (passing the full steps[] tree at once), and marks provar_testcase_step_edit as amend-only. This addresses the PDX-479 regression where step-by-step authoring caused dropped scenarios, flat asserts, and inconsistent step types. No production logic in testCaseGenerate.ts is modified — the change is to guidance copy plus regression-guard tests and a JSON-RPC validation script.
Changes:
- Rewrite
author-testflow inguidePrompts.tsandPROVAR_TOOL_GUIDE.md; splitgeneratevsstep_editin the prerequisite graph anddocs/mcp.mdcallouts; add Scenario 12 to the pilot guide. - Add new test suites:
guidePrompts.test.ts(registration + canonical phrasing) and a multi-scenario regression block intestCaseGenerate.test.ts. - Add
scripts/pdx-481-validate.cjsJSON-RPC harness to verify the rewritten copy and prerequisite-graph split via the MCP protocol surface.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/mcp/prompts/guidePrompts.ts | Rewrites the author-test flow steps and splits generate/step_edit entries in the prerequisite graph. |
| docs/PROVAR_TOOL_GUIDE.md | Rewrites "I want to write a new test" section with single-call construction guidance and amend-only step_edit usage. |
| docs/mcp.md | Adds construct-pattern callout to provar_testcase_generate and amend-only callout to provar_testcase_step_edit. |
| docs/mcp-pilot-guide.md | Adds Scenario 12 covering multi-scenario single-call construction for pilot evaluation. |
| test/unit/mcp/guidePrompts.test.ts | New unit suite verifying prompt registration and canonical author-test phrasing. |
| test/unit/mcp/testCaseGenerate.test.ts | Adds a "multi-scenario single-call construction" describe block with 4 regression-guard tests. |
| scripts/pdx-481-validate.cjs | New JSON-RPC validation harness asserting the rewritten copy is reachable and contains canonical phrasing. |
Comments suppressed due to low confidence (1)
test/unit/mcp/guidePrompts.test.ts:136
- The regex relies on default
|precedence which makes itprovar_testcase_generate[^\n]*constructORconstruct[^\n]*provar_testcase_generate. While this works, the unbounded[^\n]*can match across many tokens within the same line, so this could pass even if the words happen to appear on the same line in an unrelated context (e.g. another tool name between them). Consider tightening the regex (e.g. a bounded character window or matching specific punctuation like→/() to ensure the test actually verifies thatprovar_testcase_generateis annotated as the construct entry point.
assert.ok(
/provar_testcase_generate[^\n]*construct|construct[^\n]*provar_testcase_generate/i.test(text),
'prerequisite graph must annotate provar_testcase_generate as the construct entry point'
);
assert.ok(
/provar_testcase_step_edit[^\n]*amend|amend[^\n]*provar_testcase_step_edit/i.test(text),
'prerequisite graph must annotate provar_testcase_step_edit as the amend entry point'
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| !text.includes('repeat as needed') || text.includes('amend'), | ||
| 'author-test flow must not say "repeat as needed" without also clarifying step_edit is for amendments only' |
There was a problem hiding this comment.
Fixed in 706cb9f. The OR clause !text.includes("repeat as needed") || text.includes("amend") short-circuited to true on every run because "amend" appears throughout the rewritten author-test copy. Replaced with the unconditional !text.includes("repeat as needed") — the amend-only contract is already covered by the dedicated "author-test flow marks step_edit as amendment-only" test on line 113.
| let pass = 0; | ||
| let fail = 0; | ||
| for (const r of results) { | ||
| console.log(`${r.ok ? '[PASS]' : '[FAIL]'} ${r.label} — ${r.detail}`); | ||
| r.ok ? pass++ : fail++; | ||
| } | ||
| console.log(`\nPDX-481 validation: ${pass} passed, ${fail} failed`); | ||
|
|
||
| server.stdin.end(); | ||
| process.exit(fail > 0 ? 1 : 0); | ||
| })().catch((err) => { | ||
| console.error('Validation script error:', err); | ||
| server.kill(); | ||
| process.exit(2); |
There was a problem hiding this comment.
Partially addressed in 706cb9f. Replaced the ternary statement r.ok ? pass++ : fail++; with an explicit if/else block. On the cleanup concern: server.kill() is already called in the failure path (the script ends with .catch((err) => { console.error(...); server.kill(); process.exit(2); }) — see lines 152-156). That kills the child process before exit on rpc timeout / aborted handshake / any thrown error. Going further (synchronous wait for exit event before process.exit) would add complexity for a script that always exits immediately afterward and is only invoked as a one-shot validator, so leaving cleanup as-is.
RCA: Copilot review flagged two real defects in the PDX-481 regression-guard test and validation harness: (1) the "repeat as needed" assertion in guidePrompts.test.ts had an OR-clause that short-circuited to true because "amend" appears repeatedly elsewhere in the flow, making the assertion a no-op; (2) the prerequisite-graph regex used unbounded [^\n]* so it could match unrelated tokens between the two words on the same line; and one style nit on validate.cjs using a ternary as a statement. Fix: Made the "repeat as needed" assertion unconditional so it actually protects against the anti-pattern phrasing being reintroduced. Tightened the prerequisite-graph regex to require the exact annotation punctuation (provar_testcase_generate\s*\(construct / provar_testcase_step_edit\s*\(amend) so it cannot pass on unrelated text. Replaced the ternary-as-statement counter in pdx-481-validate.cjs with an if/else block. Also added scripts/pdx-481-trace.cjs (the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence). All gate checks pass: 1118 mocha tests, lint clean, validation 9/9.
Copilot review follow-ups — addressed in
|
| # | File | Copilot finding | Action |
|---|---|---|---|
| 1 | test/unit/mcp/guidePrompts.test.ts:106 |
!text.includes('repeat as needed') || text.includes('amend') short-circuits to true ("amend" appears throughout the copy), making the assertion a no-op |
✅ Fixed — unconditional !text.includes('repeat as needed'). The amend-only contract is still covered by the dedicated test on line 113 |
| 2 | scripts/pdx-481-validate.cjs:152 |
r.ok ? pass++ : fail++; ternary-as-statement |
✅ Fixed — explicit if (r.ok) pass++; else fail++; block |
| 2 (cont.) | same | Server cleanup on timeout/abort | ❌ Already handled — the .catch on the IIFE (lines 152-156) calls server.kill() before process.exit(2). Going further to wait for exit event would over-engineer a one-shot validator |
| 3 (suppressed-low-confidence) | test/unit/mcp/guidePrompts.test.ts:130, 134 |
Unbounded [^\n]* regex could match unrelated tokens between the two words on the same line |
✅ Fixed — tightened to provar_testcase_generate\s*\(construct and provar_testcase_step_edit\s*\(amend, anchoring on the actual (annotation) punctuation emitted by the prerequisite-graph copy |
Gate after fixes
yarn compile→ cleannode_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"→ 1118 passing, 0 failingyarn lint→ cleannode scripts/pdx-481-validate.cjs→ 9 / 9 PASS
Bonus inclusion
The follow-up commit also adds scripts/pdx-481-trace.cjs — the JSON-RPC trace harness used to capture the patched-vs-unpatched prompt-flow side-by-side that was posted to PDX-479 as concrete regression evidence. Useful for any reviewer who wants to re-verify locally:
node scripts/pdx-481-trace.cjs
Drives the patched server and dumps the exact bytes an MCP client surfaces to its LLM for: the orchestration prompt's author-test flow (TRACE 1), the tool-guide resource (TRACE 2), both tool descriptions (TRACEs 3 + 4), and a real multi-scenario provar_testcase_generate call (TRACE 5).
Summary
provar.guide.orchestrationauthor-testflow and thePROVAR_TOOL_GUIDE.md"I want to write a new test" section to mandate single-call construction of test cases (fullsteps[]tree in oneprovar_testcase_generatecall)provar_testcase_step_editas amend-only — never for constructing new cases from an empty skeletongenerate(construct) andstep_edit(amend) appear as distinct entry pointsdocs/mcp.mdand adds pilot-guide Scenario 12 so the regression class is covered in evaluationpdx-481-validate.cjsfor protocol-surface verificationJira
Why
The 1.5.0 regression (PDX-479) traced to two artifacts shipped in PR #153 that both steered LLMs toward a
generate-empty + step_edit-per-stepauthoring pattern. That pattern caused scenarios to be dropped, asserts to be emitted as flat siblings instead of nested underUiWithScreenclauses, and assert API IDs to drift across the case. PDX-480 mechanically + behaviorally confirmed disabling those artifacts restores correct generation. This PR keeps the helpful guidance but rewrites it to the correct pattern.Test plan
yarn compilecleannode_modules/.bin/nyc node_modules/.bin/mocha "test/**/*.test.ts"— 1118 passing, 0 failing (12 new tests from this PR)yarn lintcleannode scripts/pdx-481-validate.cjs— 9 / 9 PASS: author-test flow contains "single call", "ALL steps", "amend"; excludes "repeat per step"; prerequisite graph split confirmed; tool-guide resource serves the rewritten copynode scripts/mcp-smoke.cjs— not run; environmental (global sf not linked to local worktree plugin) — verified the same blocker affects the develop tip, not introduced by this PRChanges
src/mcp/prompts/guidePrompts.ts: rewrote'author-test'flow + split thegenerate/step_editlines in the prerequisite graphdocs/PROVAR_TOOL_GUIDE.md: rewrote "I want to write a new test" sectiondocs/mcp.md: added construct-pattern callout onprovar_testcase_generate; added amend-only callout onprovar_testcase_step_editdocs/mcp-pilot-guide.md: added Scenario 12 "Construct a Multi-Scenario Test Case in a Single Call"test/unit/mcp/guidePrompts.test.ts(new): registration + canonical-phrasing assertions for the author-test flow and the prerequisite graphtest/unit/mcp/testCaseGenerate.test.ts: added "multi-scenario single-call construction (PDX-481 regression guard)" describe block with 4 tests (12-step payload sequential testItemIds; scenario markers preserved; assert API ID consistency; non-SF target_uri nests correctly)scripts/pdx-481-validate.cjs(new): JSON-RPC validation harness covering 9 protocol-surface assertionsOut of scope
testCaseGenerate.ts/testCaseStepTools.ts/testCaseValidate.tsare unchanged — they were byte-identical to the known-good 1.5.0-beta.17 buildCustomer-facing docs
docs/provar-mcp-public-docs.mdanddocs/university-of-provar-mcp-course.mdare maintained separately by the Provar team and may need a mirror update if they describe the authoring sequence.🤖 Generated with Claude Code